CGN to Grail: Extracting a Type-logical Lexicon From the CGN Annotation
Abstract
The tag set for the CGN syntactic annotation is designed to enable a transparent mapping to the derivational structures of current ‘lexicalized’ grammar formalisms. Through such translations, the CGN tree bank can be used to train and evaluate computational grammars within these frameworks. In this paper we discuss some preliminary work on the mapping between the CGN annotation graphs and the proof net format of the Grail parser/theorem prover (Moot 2001, Moot 1999). Grail is a general grammar development environment for type-logical categorial grammars (TLG; Moortgat 1997, Morrill 1994, Carpenter 1998). To a large extent, there is a straightforward transfer between the type-logical format and the analyses provided by other lexicalized grammar formalisms such as LTAG (lexicalized Tree Adjoining Grammars; Sarkar 2001) and MG (computational versions of Minimalist Grammars; Stabler 1997). An attractive feature of TLG, which is not shared by these other frameworks, is its full support for hypothetical reasoning. We exploit the hypothetical reasoning facilities to extract a type-logical grammar from the CGN annotation graphs. This task divides naturally into two subtasks. The first consists of solving type equations: in the TLG setting, this means breaking up the CGN annotation graph into the subgraphs that correspond to lexical type assignments. In the presence of discontinuous dependencies, the lexical type assignments will not always be compatible with surface word order. The second subtask therefore consists of calibrating the lexicon so that it has controlled access to the structural reasoning component of the grammar.
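As a rough illustration of the first subtask, the sketch below reads lexical types off a small head-marked tree: each non-head daughter receives the atomic category of its phrase, and the head receives a functor type that consumes those daughters in the directions in which they occur. This is a toy under stated assumptions and not the extraction procedure of the paper: the names `Leaf`, `Phrase`, and `extract_lexicon` are hypothetical, the tree and its category labels are invented for the example, and modifiers, discontinuous dependencies, and structural reasoning are ignored entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Union


@dataclass
class Leaf:
    word: str
    cat: str  # atomic category used when the word is a non-head daughter


@dataclass
class Phrase:
    cat: str   # category of the whole phrase, e.g. "s" or "np"
    head: int  # index of the head daughter (hypothetical encoding)
    daughters: List["Node"] = field(default_factory=list)


Node = Union[Leaf, Phrase]


def paren(t: str) -> str:
    """Parenthesize complex types so that nesting stays unambiguous."""
    return t if t.isalnum() else f"({t})"


def over(result: str, arg: str) -> str:
    """result/arg: looks for arg to its right."""
    return f"{paren(result)}/{paren(arg)}"


def under(arg: str, result: str) -> str:
    """arg\\result: looks for arg to its left."""
    return f"{paren(arg)}\\{paren(result)}"


def extract(node: Node, goal: str, lexicon: Dict[str, Set[str]]) -> None:
    """Assign type `goal` to `node` and record the lexical types below it."""
    if isinstance(node, Leaf):
        lexicon.setdefault(node.word, set()).add(goal)
        return
    left = node.daughters[:node.head]
    right = node.daughters[node.head + 1:]
    head_type = goal
    # Left arguments are consumed last, so they end up innermost in the type.
    for d in left:
        head_type = under(d.cat, head_type)
    # Right arguments are consumed first, nearest to the head outermost.
    for d in reversed(right):
        head_type = over(head_type, d.cat)
    extract(node.daughters[node.head], head_type, lexicon)
    for d in left + right:
        extract(d, d.cat, lexicon)


def extract_lexicon(tree: Phrase) -> Dict[str, Set[str]]:
    lexicon: Dict[str, Set[str]] = {}
    extract(tree, tree.cat, lexicon)
    return lexicon


# Invented example: "Jan leest een boek" (Jan reads a book).
tree = Phrase("s", 1, [
    Leaf("Jan", "np"),
    Leaf("leest", "v"),
    Phrase("np", 1, [Leaf("een", "det"), Leaf("boek", "n")]),
])
lexicon = extract_lexicon(tree)
for word in sorted(lexicon):
    print(word, sorted(lexicon[word]))
```

In this toy the verb receives the transitive type `(np\s)/np`. The choice to treat the noun as the head of the noun phrase, so that the determiner counts as its dependent, is an assumption of the example. A dependency that is not adjacent to its head in the surface string would need the structural reasoning of the second subtask, which this sketch does not attempt.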
Similar resources
Using the Spoken Dutch Corpus for type-logical grammar induction
The dependency-based annotation format employed within the Spoken Dutch Corpus (CGN) project (van der Wouden et al., 2002) has been designed in such a way as to enable a transparent mapping to the derivational structures of current ‘lexicalized’ grammar formalisms. Through such translations, the CGN tree bank can be used to train and evaluate computational grammars within these framewo...
Carrageenan induces cell cycle arrest in human intestinal epithelial cells in vitro.
Multiple studies in animal models have shown that the commonly used food additive carrageenan (CGN) induces inflammation and intestinal neoplasia. We performed the first studies to determine the effects of CGN exposure on human intestinal epithelial cells (IEC) in tissue culture and tested the effect of very low concentrations (1-10 mg/L) of undegraded, high-molecular weight CGN. These concentr...
Syntactic Annotation for the Spoken Dutch Corpus Project (CGN)
Of the ten million words of contemporary standard Dutch in the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN), a selection of one million words of natural spoken language will be annotated syntactically. In the present paper we discuss the tag sets and the annotation procedures that are currently being developed and tested. The annotation tags provide information about syntactic constit...
Draft October 2003 3 From recent overviews of annotated
This chapter describes the broad phonemic transcription in the CGN. First a broad overview of phonetic annotations in Dutch corpora is provided and a number of crucial dimensions are discussed: the source of annotation (human or automatic), the type of material involved, the level of transcription and the symbol set and transcription conventions. These dimensions serve as a guide through a numb...
Syntactic Analysis in the Spoken Dutch Corpus (CGN)
The paper describes the syntactic annotation of the Spoken Dutch Corpus (“Corpus Gesproken Nederlands” or CGN), the Dutch-Flemish project (1998-2003) aiming at the collection, description and annotation of ten million words of spoken Dutch. In the first part, the background of the parsing strategy is discussed, as well as some details concerning the actual implementation of the parsing process....